Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples
Authors
Abstract
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, as in stratified sampling. Design-based statistical analysis tools are well suited to integrating the sample design seamlessly into the statistical analysis. However, it is also common, and often necessary, to use datasets collected under an already implemented sampling design to address questions that were not considered during the design phase. Such questions may require model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. These model-based tools may, however, require data from simple random samples to ensure unbiased estimation, which is problematic when the data come from unequal probability designs. Despite the numerous method-specific tools available to properly account for sampling design, the analysis of ecological data too often ignores the sample design, and the consequences are not properly considered. We demonstrate here that violating the assumption of simple random sampling can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). IPB is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. Using both simulated and actual ecological data, we demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis.
In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
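The abstract describes IPB only in outline. The Python sketch below illustrates the idea under stated assumptions: each resample draws $n$ units with replacement from the observed sample, unit $i$ being selected with probability $p_i = (1/\pi_i) / \sum_j (1/\pi_j)$, where $\pi_i$ is its inclusion probability, and the model is refit on every resample. The resample size, the number of resamples, the simulated two-stratum population, and the helper names `ols_fit` and `ipb_fit` are hypothetical choices for illustration, not the authors' code or data. Because stratum 1 is over-sampled and has a steeper slope, ordinary least squares on the raw stratified sample should drift toward that stratum's slope, while averaging over IPB resamples should land near the population slope.

```python
import numpy as np


def ols_fit(x, y):
    """Ordinary least squares of y on x with an intercept; returns [intercept, slope]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def ipb_fit(x, y, pi, n_boot=200, seed=1):
    """Hypothetical IPB sketch: resample observed units with replacement,
    each drawn with probability proportional to 1/pi, refit OLS on every
    resample, and return the mean coefficients plus all resampled fits."""
    rng = np.random.default_rng(seed)
    w = 1.0 / np.asarray(pi, dtype=float)
    p = w / w.sum()                      # resampling probabilities, sum to 1
    n = len(y)                           # assumption: resample size = sample size
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.choice(n, size=n, replace=True, p=p)
        draws[b] = ols_fit(x[idx], y[idx])
    return draws.mean(axis=0), draws


# Hypothetical population: two strata (80% / 20%) with different slopes.
rng = np.random.default_rng(0)
N0, N1 = 16_000, 4_000
stratum = np.repeat([0, 1], [N0, N1])
x_pop = rng.uniform(0.0, 1.0, N0 + N1)
slope_by_stratum = np.where(stratum == 0, 2.0, 5.0)
y_pop = 1.0 + slope_by_stratum * x_pop + rng.normal(0.0, 0.5, N0 + N1)
pop_slope = ols_fit(x_pop, y_pop)[1]     # census target, about 0.8*2 + 0.2*5 = 2.6

# Stratified sample with equal allocation (500 units per stratum),
# so stratum 1 is over-represented relative to the population.
n_h = 500
idx0 = rng.choice(np.flatnonzero(stratum == 0), size=n_h, replace=False)
idx1 = rng.choice(np.flatnonzero(stratum == 1), size=n_h, replace=False)
sample = np.concatenate([idx0, idx1])
pi = np.where(stratum[sample] == 0, n_h / N0, n_h / N1)  # inclusion probabilities

naive_slope = ols_fit(x_pop[sample], y_pop[sample])[1]   # ignores pi, about 3.5
ipb_coef, ipb_draws = ipb_fit(x_pop[sample], y_pop[sample], pi)
ipb_slope = ipb_coef[1]

print(f"population slope: {pop_slope:.2f}")
print(f"naive OLS slope:  {naive_slope:.2f}")
print(f"IPB mean slope:   {ipb_slope:.2f}")
```

Averaging the resampled coefficients is one simple summary; the spread of `ipb_draws` can also be inspected, but how the original study summarizes its resamples is not stated in this abstract.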
Similar resources
Unbiasing the Bootstrap—Bootknife Sampling vs. Smoothing
Bootstrap standard errors are generally biased downward, which is a primary reason that traditional bootstrap confidence intervals have coverage probability which is too low. For the sample mean the downward bias is a factor of $(n-1)/n$ (for the squared standard error); the same bias holds approximately for asymptotically-linear statistics. In the case of stratified or two-sample bootstrapping, th...
Bootstrap Procedures for the Pseudo Empirical Likelihood Method in Sample Surveys
Pseudo empirical likelihood ratio confidence intervals for finite population parameters are based on an asymptotic $\chi^2$ approximation to an adjusted pseudo empirical likelihood ratio statistic, with the adjustment factor related to the design effect. Calculation of the design effect involves variance estimation and hence requires second order inclusion probabilities. It also depends on how auxiliary...
Bootstrap Asymptotics
The bootstrap, introduced by Efron (1979), merges simulation with formal model-based statistical inference. A statistical model for a sample $X_n$ of size $n$ is a family of distributions $\{P_{\theta,n} : \theta \in \Theta\}$. The parameter space $\Theta$ is typically metric, possibly infinite-dimensional. The value of $\theta$ that identifies the true distribution from which $X_n$ is drawn is unknown. Suppose that $\hat{\theta}_n = \hat{\theta}_n(X_n)$ is a consis...
Combining Multiple Imputation and Inverse-Probability Weighting
Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribut...
Sampling Considerations for Disease Surveillance in Wildlife Populations
Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife...